o 

(N 



(N 



o 



A New Randomized Algorithm to Approximate the 
Star Discrepancy Based on Threshold Accepting 

Michael Gnewuch* Magnus Wahlstrom^ Carola Winzen^ 

January 14, 2013 

Abstract 



We present a new algorithm for estimating the star discrepancy 
of arbitrary point sets. Similar to the algorithm for discrepancy ap- 
proximation of Winker and Fang [SIAM J. Numcr. Anal. 34 (1997), 
2028-2042] it is based on the optimization algorithm threshold ac- 
. cepting. Our improvements include, amongst others, a non-uniform 

' sampling strategy which is more suited for higher-dimensional inputs, 

^ . and rounding steps which transform axis-parallel boxes, on which the 

O I discrepancy is to be tested, into critical test boxes. These critical test 

boxes provably yield higher discrepancy values, and contain the box 
' that exhibits the maximum value of the local discrepancy. We provide 

^ . comprehensive experiments to test the new algorithm. Our randomized 

algorithm computes the exact discrepancy frequently in all cases where 
this can be checked (i.e., where the exact discrepancy of the point set 
' can be computed in feasible time) . Most importantly, in higher dimen- 

sion the new method behaves clearly better than all previously known 
methods. 



1 Introduction 



^ ! Discrepancy theory analyzes the irregularity of point distributions and has 



considerable theoretical and practical relevance. There are many different 
discrepancy notions with a wide range of applications as in optimization, 
combinatorics, pseudo random number generation, option pricing, computer 
graphics, and other areas, see, e.g., the monographs [BC87, ChaOO, DPIO, 
DT97, FW94, Lem09, Mat09, Nie92, NWIO]. 

In particular for the important task of multivariate or infinite dimen- 
sional numerical integration, which arises frequently in fields such as fi- 
nance, statistics, physics or quantum chemistry, quasi-Monte Carlo algo- 
rithms relying on low-discrepancy samples have extensively been studied in 
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the last decades. For several classes of integrands the error of quasi-Monte 
Carlo approximation can be expressed in terms of the discrepancy of the 
set of sample points. This is put into a quantitative form by inequalities of 
Koksma-Hlawka- or Zaremba-type, see, e.g., [DPIO, Gnell, NWIO] and the 
literature mentioned therein. The essential point here is that a set of sample 
points with small discrepancy results in a small integration error. 

Of particular interest are the star discrepancy and the weighted star 
discrepancy, which we define below. For theoretical and practical reasons 
the weighted star discrepancy attracted more and more attention over the 
last few years, see, e.g., [DLP05, HPS08, Joe06, SJ07]. In particular, it is 
very promising for finance applications, see [SlolO]. 

Let X = (x*)^;^ be a finite sequence in the d-dimensional (half-open) 
unit cube [0, 1)'^. For y = (j/i, . . . ,yii) G [0, l]'^ let A{y,X) be the number 
of points of X lying in the d-dimensional half-open subinterval [0,j/) := 
[0,yi) X • • • X [0, and let Vy be the d-dimensional (Lebesgue) volume of 
[0, y) . We call 

dUX):= sup \Vy-lA{y,X)\ 

the L°°-star discrepancy, or simply the star discrepancy of X. 

For a subset u C {l,...,d} define : [0,1]"^ [0, l]'"',?/ ^ ivih&u- 
For a finite sequence of non-negative weights {'yu)uc{i,...,d} the weighted star 
discrepancy of X is defined by 

d;^^{X) := sup ^udW^uiX)). 

07^MC{l,...,d} 

Obviously, the star discrepancy is a special instance of the weighted star 
discrepancy. Further important discrepancy measures are, e.g., the L^-star 
discrepancies 



d*(X) :-- 




Vy--A{y,X) 
n 




1 <p < oo. 



and weighted versions thereof. In this article we focus on algorithms to 
approximate the star discrepancy, but note that these can be used as bases 
for algorithms to approximate the weighted star discrepancy. 

In many applications it is of interest to measure the quality of certain sets 
by calculating their (weighted or unweighted) star discrepancy, e.g., to test 
whether successive pseudo random numbers are statistically independent 
[Nie92] , or whether given sample sets are suitable for multivariate numerical 
integration of certain classes of integrands. As explained in [DGKP08], the 
fast calculation or approximation of the (weighted) star discrepancy would 
moreover allow efficient randomized semi-constructions of low-discrepancy 
samples of moderate size (meaning at most polynomial in the dimension 
d). Actually, there are derandomized algorithms known to construct such 
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samples deterministically [DGKP08, DGW09, DGWIO], but these exhibit 
high running times. Therefore, efficient semi-constructions would be help- 
ful to avoid the costly derandomization procedures. The critical step in 
the semi-construction is the efhcient calculation (or approximation) of the 
discrepancy of a randomly chosen set. 

The L^-star discrepancy of a given n-point set in dimension d can be 
computed with the help of Warnock's formula [War 72] with 0{d'n?) arith- 
metic operations. Heinrich and Frank provided an asymptotically faster 
algorithm using 0{n{[ogn)'^^^) operations for fixed d [FH96, Hei96]. Simi- 
larly efficient algorithms are not known for the star discrepancy (and thus 
also not for the more general weighted star discrepancy). In fact it is known 
that the problem of calculating the star discrepancy of arbitrary point sets 
is an A^P-hard problem [GSW09]. Furthermore, it was shown recently that 
it is also a M^[l]-hard problem with respect to the parameter d [GKWWll]. 
So it is not very surprising that all known algorithms for calculating the star 
discrepancy or approximating it up to a user-specified error exhibit running 
times exponential in d, see [DEM96, Gne08, ThiOla, ThiOlb]. Let us have a 
closer look at the problem: For a finite sequence X = (x*)"^^ in [0, l)'^ and 
for J G {1, . . . , d} we define 

Tj{X) = {x] h G {1, n}} and fj{X) = Tj{X) U {!}, 

and the grids 

r(X) =ri(X) X ••• xrd(X) and f(X) =fi(X) x ••• x frf(X). 
Then we obtain 

fi;„(X) = max| max {Vy--A{y,X) \ , max ( -A{y, X) - Vy]\ , 

(1) 

where A{y, X) denotes the number of points of X lying in the closed d- 
dimensional subinterval [0,y]. (For a proof see [GSW09] or [Nie72, Thm. 2].) 
Thus, an enumeration algorithm would provide us with the exact value of 
(ij^(X). But since the cardinality of the grid T{X) for almost all X is n'^, 
such an algorithm would be infeasible for large values of n and d. 

Since no efficient algorithm for the exact calculation or approximation 
of the star discrepancy up to a user-specified error is likely to exist, other 
authors tried to deal with this large scale integer programming problem by 
using optimization heuristics. In [WF97], Winker and Fang used threshold 
accepting to find lower bounds for the star discrepancy. Threshold accepting 
[DS90] is a refined randomized local search algorithm based on a similar 
idea as the simulated annealing algorithm [KGV83]. In [ThiOlb], Thiemard 
gave an integer linear programming formulation for the problem and used 
techniques as cutting plane generation and branch and bound to tackle it 
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(cf. also [GSW09]). Quite recently, Shah proposed a genetic algorithm to 
calculate lower bounds for the star discrepancy [ShalO]. 

Here in this paper we present a new randomized algorithm to approx- 
imate the star discrepancy. As the algorithm of Winker and Fang ours is 
based on threshold accepting but adds more problem specific knowledge to 
it. The paper is organized as follows. In Section 2 we describe the algorithm 
of Winker and Fang. In Section 3 we present a first version of our algorithm. 
The most important difference to the algorithm of Winker and Fang is a new 
non-uniform sampling strategy that takes into account the influence of the 
dimension d and topological characteristics of the given point set. In Section 
4 we introduce the concept of critical test boxes, which are the boxes that 
lead to the largest discrepancy values, including the maximum value. We 
present rounding procedures which transform given test boxes into critical 
test boxes. With the help of these procedures and some other modifications, 
our algorithm achieves even better results. However, this precision comes 
at at the cost of larger running times (roughly a factor two, see Table 1 
in Section 6). In Section 5 we analyze the new sampling strategy and the 
rounding procedures in more depth. We provide comprehensive numerical 
tests in Section 6. The results indicate that our new algorithm is superior 
to all other known methods, especially in higher dimensions. The appendix 
contains some technical results necessary for our theoretical analyses in Sec- 
tion 5. 

2 The Algorithm of Winker and Fang 
2.1 Notation 

In addition to the notation introduced above, we make use of the following 
conventions. 

For all positive integers m G N we put [m] := {1, . . . , m}. If r G M, let 
[rj := max{n S Z|n < r}. For the purpose of readability we sometime 
omit the [-J sign, i.e., whenever we write r where an integer is required, we 
implicitly mean [rJ . 

For general x,y G [0, 1]*^ we write x < y if Xj < Uj for all j € [d\ and, 
equivalently, x < y if xj < yj for all j £ [d]. The characteristic function l[o,a;) 
is defined on [0,1]*^ by l[o,x)(y) := 1 if y < a; and l[o,x)(y) '■= otherwise. 
We use corresponding conventions for the closed d-dimensional box [0,x]. 

For a given sequence X = (x^)^^^ in the d-dimensional unit cube [0, 1)'^, 
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we define the following functions. For all y G [0, 1]"^ we set 



5iy) := 5iy,X) ~ Vy - A{y, X) = Vy - -^,^0 .a(x''' 



k=l 



1 " 

6{y) := 5{y, X) := A{y, X)-Vy = -Y^ Mo,y] (^') " ' 



n 

k=l 

and d*{y) :=6*{y,X) := max {^(y), %)}. Then d*^{X) = maXygr(x) '^^(y) 
as discussed in the introduction. 

2.2 The algorithm of Winker and Fang 

Threshold accepting is an integer optimization heuristic introduced by Dueck 
and Scheuer in [DS90]. Althofer and Koschnik [AK91] showed that for suit- 
ably chosen parameters, threshold accepting converges to a global optimum 
if the number / of iterations tends to infinity. Winker and Fang [WF97] 
applied threshold accepting to compute the star discrepancy of a given re- 
point configuration. In the following, we give a short presentation of their 
algorithm. A flow diagram of the algorithm can be found in [WF97] . 

Initialization: The heuristic starts with choosing uniformly at random 
a starting point x'^ E r(X) and calculating 6*{x^) = inax{6{x^),S{x'^)}. Note 
that throughout the description of the algorithm, x^ denotes the currently 
used search point. 

Optimization: A number I of iterations is performed. In the t-th 
iteration, the algorithm chooses a point x"'^ uniformly at random from a 
given neighborhood J\f{x'^) of x'^ and calculates 6*{x"'''). It then computes 
A6* := - 6*{x'^). If Ad* > T for a given (non-positive) threshold 

value T, then x'^ is updated, i.e, the algorithm sets x'^ := x^^. With the help 
of the non-positive threshold it shall be avoided to get stuck in a bad local 
maximum x"^ of 6* — "local" with respect to the underlying neighborhood 
definition. The threshold value T changes during the run of the algorithm 
and ends up at zero. This should enforce the algorithm to end up at a local 
maximum of 6* which is reasonably close to d'^{X). 

Neighborhood Structure: Let us first give the neighborhood defini- 
tion used in [WF97]. For this purpose, let x E f (X) be given. Let £ < n/2 
be an integer and put k := 2i + 1. We allow only a certain number of co- 
ordinates to change by fixing a value mc E [d] and choosing mc coordinates 
ill • • • jjmc G [d] uniformly at random. For j E {ji, . . . ,jmc} we consider the 
set of grid coordinates 

A/'fcjW := {t G f,(X) I max{l,0-i(x,)-n < <p-\j) < mm{\fj{X)\,4>j\x,)+£} }, 

where (^j : [|f — > Tj{X) is the ordering of the set Tj{X), i.e., (^j(r) < 
4>j{s) for r < s. The neighborhood AA^^ (a;) of x of order k is the 
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Cartesian product 

Ni^-'^-{x) := Mk,i{x) X ... X Mk,d{x) , (2) 

where Mk,j{x) = Mk,j{x) for j £ {ji, . . . ,jmc] and Mk,j{x) = {xj] otherwise. 
Clearly, |iVf '-'^'"^(x)! < (2^ + 1)"*^ We abbreviate A/'f^(a;) := AA^'''-'^'"=(x) 
if j'l, . . . , jmc are mc coordinates chosen uniformly at random. 

Threshold values: Next, we explain how the threshold sequence is cho- 
sen in [WF97]. The following procedure is executed prior to the algorithm 
itself. Let / be the total number of iterations to be performed by the algo- 
rithm and let /c < n and mc < c? be fixed. For each t G [\//], the procedure 
computes a pair {y^,y^), where ?/* S r(X) is chosen uniformly at random 
and € J^^^{y^), again chosen uniformly at random. It then calculates 
the values T{t) := -\d*{y^) - <5*(y*)|. When all values T{t), t = l,...,Vl, 
have been computed, the algorithm sorts them in increasing order. For a 
given a G (0.9,1], the a^/l values closest to zero are selected as threshold 
sequence. The number J of iterations performed for each threshold value is 
J = a-^Vl- 

3 A First Improved Algorithm — TA_basic 

Our first algorithm, TA_basic, builds on the algorithm of winker and Fang 
as presented in the previous section. A preliminary, slightly different version 
of TA_basic can be found in [Win07]. This version was used in [DGWIO] to 
provide lower bounds for the comparison of the star discrepancies of different 
point sequences. In particular in higher dimensions it performed better than 
any other method tested by the authors. 

Recall that the algorithm of winker and Fang employs a uniform prob- 
ability distribution on T(X) and the neighborhoods Mj!^'^{x) for all random 
decisions. 

Firstly, this is not appropriate for higher-dimensional inputs: In any 
dimension d it is most likely that the discrepancy of a set X is caused by 
test boxes with volume at least c, c some constant in (0, 1). Thus in higher 
dimension d we expect the upper right corners of test boxes with large local 
discrepancy to have coordinates at least c^/'^. Thus it seems appropriate for 
higher dimensional sets X to increase the weight of those points in the grid 
T[X) with larger coordinates whereas we decrease the weight of the points 
with small coordinates. 

Secondly, a uniform probability distribution does not take into account 
the topological characteristics of the point set X as, e.g., distances between 
the points in the grid T[X): If there is a grid cell [x,y] in f (X) (i.e., x,y £ 
T{X) and (pJ^iVj) = 'PJ^ + ^ for all j G [d], where (pj is again the ordering 
of the set Tj{X)) with large volume, we would expect that 6{x) or 5{y) are 
also rather large. 
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Thus, on the one hand, it seem better to consider a modified probabiUty 
measure on T{X) which accounts for the influence of the dimension and 
the topological characteristics of X. On the other hand, if n and d are 
large, we clearly cannot afford an elaborate precomputation of the modified 
probability weights. 

To cope with this, the non-uniform sampling strategy employed by 
TA_basic consists of two steps: 

• A continuous sampling step, where we select a point in the whole d- 
dimensional unit cube (or in a "continuous" neighborhood of x'^) with 
respect to a non-uniform (continuous) probability measure vr"^, which 
is more concentrated in points with larger coordinates. 

• A rounding step, where we round the selected point to the grid T{X). 

In this way we address both the influence of the dimension and the topo- 
logical characteristics of the point set X. This works without performing 
any precomputation of probability weights on T{X) - instead, the random 
generator, the change of measure on [0, 1]'^ from the d-dimensional Lebesgue 
measure to vr*^, and our rounding procedure do this implicitly! Theoretical 
and experimental justifications for our non-uniform sampling strategy can 
be found in Section 5 and Section 6. 

3.1 Sampling of Neighbors 

In the following, we present how we modify the probability distribution over 
the neighborhood sets. Our non-uniform sampling strategy consists of the 
following two steps. 

Continuous Sampling Consider a point x G r(X). For fixed mc E [d] 
let ji, . . . ,jmc G [d] be pairwise different coordinates. For j G {ji, . . . ,jmc} 
let ipj : [|fj(X)U{0}|] fj(X)U{0} be the ordering of the set fj(X)U{0} 
(in particular (pj{l) = 0). Let us now consider the real interval Ckj{x) := 
[Cixj),r]{xj)] with 

C{xj) := ip{max{l,(f~^{xj) - £}) and r/(xj) := (f{mm{\fj{X)U {0}\,(p~'^{xj) + £}) . 

Our new neighborhood C^^''"'''""'(x) of x of order k is the Cartesian product 

Cf '••••^-(x) := Ck,iix) X ... X Ck,d^) , (3) 

where Ck,j{x) = Ck,j{x) for j £ {ji, . . . ,jrnc} and Ck,j{x) = {xj] otherwise. 
We abbreviate C]^'^{x) := C^.^ (x) if ji, . . . , jmc are mc coordinates cho- 
sen uniformly at random. 

Instead of endowing C^^ (x) with the Lebesgue measure on the non- 
trivial components, we choose a different probability distribution which we 
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describe in the foUowing. First, let us consider the polynomial product 
measure 



7r"'( dx) = ^j^Jixj) A( dxj) with density function / : [0, 1] M , r H> dr'^~^ 

on [0, l]'^; here A = should denote the one-dimensional Lebesgue measure. 
Notice that in dimension d = 1 we have = A. Picking a random point 
y G [0, l]*^ with respect to the new probability measure tt'^ can easily be 

done in practice by sampling a point z G [0, l]'' with respect to A"^ and then 

, / i/d i/d-. 
puttmg y := (z/ , . . . ,zj ). 

We endow (x) with the probability distribution induced 

by the polynomial product measure on the mc non-trivial components 

Ck,ji{x), . . . ,Ck,j^X^)- To be more explicit, we map each Ckj{x), j G 

{ji, • • • ,jmc}, to the unit interval [0, 1] by 



: Ck,j{x) ^ [0,l],r ^ 



r 



d 



{rj{x,)r - iCixj)) 



d • 



Recall that (,{xj) := mmCkj{x) and ??(xj) := inaxCkj{x). The inverse 
mapping ^'J^ is then given by 



: [0, 1] ^ Ck, , s ^ (((r?(x,))'^ - {C{x,)r)s + (^(x,))'^) 



i/d 



If we want to sample a random point y G Cj}'"''''"^''{x), we randomly choose 
scalars si,...,Smc in [0,1] with respect to A and put yj. := ^~^{si) for 
i = 1, . . . , mc. For indices j ^ {ji, . . . ,jmc} we set yj ■= Xj. 

Rounding Procedure: We round the point y once up and once down 
to the nearest points and y^ in T{X). More precisely, for all j G [d], let 
yj' := min{x* G f | yj < x*}. If yj > minf we set yj := max{x*- G 
rj(X) I yj > X*} and in case yj < minrj(X), we set yJ := maxrj(X). 

Obviously, A{y+,X) = Aiy,X) and thus, 6{y+) = Vy+ - A{y+,X) > 
Vy — A{y,X) = 6{y) . Similarly, if yj > minrj(X) for all j G [d], we have 
A{y-,X) = A{y,X). Hence, 6{y-) = A{y~-,X) - Vy- > A{y,X) - Vy = 
5{y) . If yj < minrj(X) for at least one j G [d] we have 5{y) < since 
A{y, X) = 0. But we also have A{y, X) = and thus 6*{y) = 5{y) < 5{y+) . 
Putting everything together, we have shown that max{(5(y"''), > 

Since it is only of insignificant additional computational cost to also 
compute 6{y~'~) where yj'~ ■= yJ for all j G [d\ with yj > minrj(X) and 
y~'~ := m.m.T j{X) for j with yj < minr-,(X), we also do that in case at 
least one such j with yj < inmTj{X) exists. 

For sampling a neighbor x^^ of x'^ the algorithm thus does the following. 
First, it samples mc coordinates ji,---,jmc £ [d] uniformly at random. 
Then it samples a point y G Cf^''"'-''"''(x'^) as described above, computes the 
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rounded grid points , y~ , and and computes the discrepancy 6^{y) := 
max{6{y~^), 5{y~),5{y~'~)} of the rounded grid points. The subscript F shall 
indicate that we consider the rounded grid points. As in the algorithm of 
Winker and Fang, TA_basic updates rr^ if and only if A6* = 6^{y) — 6* (x^) > 
T, where T denotes the current threshold. In this case we always update x'^ 
with the best rounded test point, i.e., we update x'^ := y~^ if 5p(y) = S{y~^), 
x'^ := y~ if 5p(y) = S{y~), and x'^ := y~~ otherwise. 

3.2 Sampling of the Starting Point 

Similar to the probability distribution on the neighborhood sets, we sample 
the starting point x'^ as follows. First, we sample a point x from [0, l]'^ 
according to n'^. We then round x up and down to x~^ , x~ , and x~~, 
respectively and again we set x'^ := x~^ if S^{x) = 5{x~^), x'^ := x~ if S^{x) = 
S{x~), and we set x'^ := x~'~ otherwise. 

3.3 Computation of Threshold Sequence 

The modified neighborhood sampling is also used for computing the sequence 
of threshold values. If we want the algorithm to perform / iterations, we 
compute the threshold sequence as follows. For each t G [vT] we sample 
a pair {y*,y*), where S T{X) is sampled as is the starting point and 
G f (X) is a neighbor of y*, sampled according to the procedure described 
in Section 3.1. The thresholds —\5*{y^) — S*{y*)\ are sorted in increasing 
order and each threshold will be used for y/l iterations of TA_basic. Note 
that by this choice, we are implicitly setting a := 1 in the notion of the 
algorithm of Winker and Fang. 

4 Further Improvements — Algorithm TA_improved 

In the following, we present further modifications which we applied to the ba- 
sic algorithm TA_basic. We call the new, enhanced algorithm TA_improved. 

The main improvements, which we describe in more detail below, are 
(i) a further reduction of the search space by introducing new rounding 
procedures ("snapping"), (ii) shrinking neighborhoods and growing number 
of search directions, and (iii) separate optimization of 6 and 5 . 

4.1 Further Reduction of the Search Space 

We mentioned that for calculating the star discrepancy it is sufficient to test 
just the points y G r(X) and to calculate S*{y), cf. equation (1). Therefore 
r(X) has been the search space we have considered so far. But it is possible 
to reduce the cardinality of the search space even further. 
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We obtain the reduction of the search space via a rounding procedure 
which we cah snapping. As this is an important element in the modified 
algorithm, we now discuss the underlying concept of critical points (or test 
boxes). For y S [0, l]*^ we define 

j-i d 
Sjiy) ■■= Vi) X {%■} X J] [0, yfc) , j = 1, . . . , d . 

i=l k=j+l 

We say that Sj{y) is a 6{X)-critical surface if Sj{y) H {x^, . . . jx"} 7^ or 
yj = 1. We call y a 5[X)-critical point if for all j G [d\ the surfaces Sj{y) 
are critical. Let C denote the set of (5(X)-critical points in [0, 1]*^. 
Let Sj{y) be the closure of Sj{y), i.e., 

i-i d 
SM ■■= HiO, Vi] X {yj} X H [0,yk], j = l,...,d. 

1=1 k=j+l 

We say Sj{y) is a 5{X) -critical surface if H {xi, . . . , x„} / 0. If for all 
j G [d] the surfaces Sj{y) are (5(X)-critical, then we call y a 5{X)-critical 
point. Let C denote the set of ^(X)-critical points in [0,1]'^. We call y a 
6* {X)- critical point if y € C* := C U C 

For j G [c?] let i/^ := |f and let again : — f denote the 

ordering of Tj{X). Let $ : [z^i] x • • • x [z/(i] — )• r(X) be the mapping with 
components (pj, j = I, . . . ,d. We say that a multi-index (ii, . . . , i^) E 
is a 5{X)-critical multi-index if ^>(ii, . . . , id) is a 5(X)-critical point. We use 
similar definitions in cases where we deal with 5{X) or 5*{X). 

Lemma 4.1. Let X = {x^ , . . . , x"} he a n-point configuration in [0, 1)*^. Let 
C = C{X), C = C{X), and C* = C*{X) be as defined above. Then C, C and 
C* are non-empty subsets ofT(X). Furthermore, 

sup 6{y) = max6{y) , sup 6{y) = max 6 (y) and sup 6* (y) = max 6* (y) 
ye[o,i]'' y6[0,i]d yeC y(i[Q,iY 

Proof. The set C is not empty, since it contains the point (1, . . . , 1). Let 
y & C. By definition, we find for all j G [d] an index a{j) G [n] with 
yj = x'^^^^ or we have Vj = 1. Therefore y E T{X). Let z G [0,1]*^ \ C. 
Since 5{z) = if Zj = for any index j, we may assume Zj > for all 
j. As z ^ C there exists a j £ [d] where Sj{z) is not (^(X)-critical. In 
particular, we have zj < 1. Let now r G Fjl^) be the smallest value with 
Zj < T. Then the point z := [zi, . . . , Zj-i,T, Zj^i, . . . , z^) fulfills Vz > Vz- 
Furthermore, the sets [0, z) \ [0, z) and X are disjoint. So [0, i) and [0, z) 
contain the same points of X. In particular we have A{z^ X) = ^(z, X) and 
thus, 5(z) > b{z). This argument verifies supj^gjgjjd (^(y) = maxygc<^(y)- 
The remaining statements of Lemma 4.1 can be proven with similar simple 
arguments. □ 
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We now describe how to use this concept in our algorithm. Let us 
first describe how we sample a neighbor x"^ of a given point x'^. The 
procedure starts exactly as described in Section 3.1. That is, we first 
sample mc coordinates ji,---,jmc S [d] uniformly at random. Next, we 
sample y G Cj}'"'''''^''{x^) according to the probability distribution induced 
by the polynomial product measure vr'^ on the non-trivial components of 
(^ji>---jmc^^c^^ cf. Section 3.1. Again we round y up and down to the near- 
est grid points y"'", y~ and respectively. We then apply the following 
snapping procedures^. 

Snapping down. We aim at finding a ^(X)-critical point < y~ 

such that the closed box [0, y^'*"]^^;^ contains exactly the same points of X 
as the box [0, y^]^^^. We achieve this by simply setting for all j € [d] 

:= max{a;* | i G [n],x* G [0,y~]} . 

From the algorithmic perspective, we initialize y~''^^ ;= (0, . . . , 0) and check 
for each index i G [n] whether G [0,y~]. If so, we check for all j G [d\ 
whether x*- < y^"'*" and update yj'^^ '■= x^j otherwise. 

Snapping up^. Whereas snapping down was an easy task to do, the 
same is not true for snapping up, i.e., rounding a point to a (5(X)-critical 
one. More precisely, given a point y~^, there are multiple 5(X)-critical points 
y+,sn y y+ g^g}^ that the open box created by contains only those 

points which are also contained in [0, y"*"). 

Given that we want to perform only one snapping up procedure per 
iteration, we use the following random version of snapping upwards. In 
the beginning, we initialize y+'*" := (1,...,1). Furthermore, we pick a 
permutation a of [d] uniformly at random from the set of all permutations 
of set [d]. For each point x G {x* | i G [n]} we now do the following. If 
X G [0,y~^) or Xj > yp^^ for at least one j G [d], we do nothing. Otherwise 
we update y^^*" := ^ctO) ^or the smallest j G [d] with x^(j) > y^^^jy After 
this update, x is no longer inside the open box generated by y"*"'*". 

Note that snapping up is subject to randomness as the (5(X)-critical 
point obtained by our snapping procedure can be different for different per- 
mutations a & Sd- 

The complexity of both snapping procedures is of order 0{nd). In our 
experiments, the snapping procedures caused a delay in the (wall clock) 
running time by a factor of approximately two, if compared to the running 
time of TA_basic. It is not difficult to verify the following. 

Lemma 4.2. Let X he a given n-point sequence in [0, 1)*^ For all y G [0, 1]"^, 

^The snapping procedure is the same for and y^'^ ■ Therefore, we describe it for 
y~ only. 

^Being aware that "snapping up" is an oxymoron, we still use this notation as it eases 
readability in what follows. 
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the point y"^'*", computed as described above, is 6{X)-critical and both y '^^ 
and y~'~'^"- are 6{X)- critical. 

In the run of the algorithm we now do the following. Given that we 
start in some grid point x*^, we sample y G C^^ix^) and we round y to the 
closest grid points y~^ ,y~ ,y~'~ £ f (X) as described in Section 3.1. Next we 
compute the (5(X)-critical point y"'''*", the (5(X)-critical point y~'^", and, if 
y^ 7^ y ' 5 we also compute the 5(X)-critical point y~'~'^^. We decide to 
update x'^ if A5* = > T, where T is the current threshold, 

§*,sn^y^ := max{(5(y+'^"),^(y"'^"), and is the value as 

was computed in the iteration where j;^ was updated last. Note that we 
do not update x"^ with any of the critical points y+>''", y~'^", or y~~'^'^ 
but only replace x'^ with the simple rounded grid points y'^ , y~, or y~'~, 
respectively. More precisely, we update x^ := y^ if 5*''^"(y) = 5{y^''^^), 
x^ := y~ if 6*'^^{y) = and x'^ := y~'~, otherwise. 

4.1.1 Computation of the Starting Point and the Threshold Se- 
quence 

When computing the starting point x'^ we first sample a random point x 
from [0, l]'^ according to tt'^ (see Section 3.1) and compute x~^ and x~ , and, if 
applicable, x~'~ . We also compute the S{X)- and 5(X)-critical points x"*"'*"", 
x~'^"', and and set 6*'''^"'{x) := max{(5(a;+'*"), 

We put x'^ := x~^ if 6*'^'^{x) = 5(x+''*"), x'^ := x~ if 6*''^"-{x) = 6{x~'^^), and 
x^ := x~'~ , otherwise. 

For computing the threshold sequence, we also use the 6{X)- and ^(-^)- 
critical (5*'*"-values. That is, for t = 1, . . . , y/l we compute t-th pair 
{y^,y*) by first sampling a random starting point as described above 
(i.e., y* G {x~^ ,x~ ,x~'~} for some x sampled from [0,1]°' according to tt*^ 
and y* = x+ if 6*'^"-{x) = 5(x+'*"'), y* = x~ if 6*''^"-{x) = ^(x~'^"), and 
y* = x"'" otherwise). We then compute a neighbor y* G C'lT'^Cy*) and 
the maximum of the discrepancy of the S{X)- and (f(X)-critical points 
5*''^{y^) := max{5(y*'+'^'^),5(y*'-'""),5(y*'-'-'^")}. Finally, we sort the 
threshold values T{t) := -|(5*''^"'(y*) - 5*''^"(y*)|, t = 1,...,a/I in increas- 
ing order. This will be our threshold sequence. 

4.2 Shrinking Neighborhoods and Growing Number of 
Search Directions 

We add the concept of shrinking neighborhoods, i.e., we consider neighbor- 
hoods that decrease in size during the run of the algorithm. The intuition 
here is the following. In the beginning, we want the algorithm to make large 
jumps. This allows it to explore different regions of the search space. How- 
ever, towards the end of the algorithm we want it to become more local, 
allowing it to explore large parts of the local neighborhood. We implement 
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this idea by iteratively shrinking the /c-value. At the same time, we increase 
the mc-value, letting the algorithm explore the local neighborhood more 
thoroughly. 

More precisely, we do the following. In the beginning we set i := (n — 
l)/2 and mc := 2. That is, the algorithm is only allowed to change few 
coordinates of the current search point but at the same time it can make 
large jumps in these directions. Recall that k = 2i -\- 1. In the t-th iteration 
(out of a total number of I iterations) we then update 

n — 1 I — t t , t , , 

£ := — J h J and mc := 2 + j{d - 2) . 

For the computation of the threshold sequence, we equivalently scale k 
and mc by initializing i := (n — l)/2 and mc := 2 and then setting for the 
computation of the t-th pair 

_ nj2l . ^/Z^ + _L and mc := 2 + -^{d - 2) . 

Recall that we compute a total number of ^/l threshold values. 
4.3 Seperate Optimization of 6 and 6 

Our last modification is based on the intuition that the star discrepancy is 
either obtained by an open, subproportionally filled box (i.e., there exists 
a ?/ S f (X) such that d'^{X) = S{y)), in which case one might assume 
that there are many points y with large 5(y)-values. Alternatively, if the 
discrepancy is obtained by a closed, overproportionally filled box (i.e., there 
exists a y G r(X) such that d'^{X) = 6{y)), we assume that there are 
multiple such points y with large 5(y)-values. This intuition triggered us to 
test also the following split variant of the algorithm. 

In the 5-version of the algorithm, we only consider open test boxes. 
That is, whenever we want to sample a random starting point [a random 
neighbor], we proceed exactly as described in Section 4.1 but instead of 
computing both y+ and y~ (and, potentially y~'~) as well as the S{X)- and 
(5(X)-critical points y~^'^^, and y~'~''^"' in the notation of Section 4.1, 

we only compute [and y^''^"], and we initialize := y"*" [we update 
x'^ := y+ if and only if A5 = 6{y~^''^"') — 6[{x'^)~^'^^) > T, where T again 
denotes the current threshold]. 

The 5- version is symmetric. We compute both y~ and y~'~ as well 
as the 5(X)-critical points y~'*" and y~'~'''", and we initialize x'^ := y~ 
or x^ := y^'^ [we update x^ := y^ or x"^ := y^'^ if and only if A6 = 
max{5(y-'^'^),5(y-'-'^'^)} - J((x^)-'^") > T]. 

Note that only 5-values (or (5-values, respectively) are considered for the 
computation of the threshold sequence as well. 



13 



The algorithm is now the fohowing. We perform I iterations of the 5- 
version of the algorithm and / iterations of the 5-version. We then output 
the maximum value obtained in either one of the two versions. 

It should be noted that a large proportion of the computational cost of 
TA_improved lies in the snapping procedures. Thus, running / iterations 
of the (5-version followed by / iterations of the J-version has a comparable 
running time to running / iterations of an algorithm of the "mixed" form 
where we snap each point up and down to the 5{X)- and J(X)-critical grid 
points. Furthermore, as most modern CPUs are multicore and able to run 
several programs in parallel, the actual wall-clock cost of switching from 
TA_basic to the split version of TA_improved may be smaller still. 

Algorithm 1 summarizes TA_improved. Note that 5{n,d,X,I) is the 
equivalent of Algorithm 2 where we replace 6 by 6, by x~ etc. The 
same is true for Subroutine Thresholds(n, d, X, \/7, ^) for computing the 
threshold sequence for the (5-version. 



Algorithm 1: The algorithm TA.improved for computing lower 
bounds of the star discrepancy 

1 Input: 

2 Problem instance: n G N, d G N, sequence X = (x^)^^-^ in [0, l)'^. 

3 Number of iterations /. 

4 Computation of a lower bound for d^{X): 

5 6 := 5{n,d,X,I) /* Output of / iterations of the (5-version.*/ 

6 (5 := 6{n,d,X,I) /* Output of / iterations of the J-version.*/ 

7 Output: 6* := max{6,5}. 



4.4 Further Variants of the Algorithm 

We do not update x'^ with the critical points, since our experiments showed 
that the performance of the algorithm can be significantly improved by 
updating with the (simply) rounded, not necessarily critical points. This 
seems to allow the algorithm more flexibility and prevents it from getting 
stuck in a local optimum too early. 

We also tested a variant of the algorithm where we only update 
the best-so-far solution with 6*''''^{y) := max{5(y+''*"), 
but where all other decisions are only based on the value 6^{y) := 
m.SLK{6{y~^),6{y~),6{y~'~)}. That is, this algorithm does exactly the same 
as TA_basic but in addition computes the S{X)- and (5(X)-critical points and 
stores the largest values of 5*'^"'. Clearly, the performance (up to random 
noise) is better than the one of TA_basic at the cost of a higher running- 
time. However, it did not perform as well as the one described above where 
the decision of whether or not to update a point also depends on the S{X)- 
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Algorithm 2: The (5- version 5{n,d,X,I). 



1 Initialization: 

2 TS= {T{i))'^^ := Thresholds(n, d, X, VI, 6) /*Compute the 
threshold sequence of length Vl.*/ 

3 Sample the starting point: pick x € [0, l)'^ with respect to ir'^ and 
round x up to the nearest grid point x~^. Compute the 
5(X)-critical point x"*"'*" and S{x~^''^^). 

4 Initialize x^ := x~^, global := (5(a;+'*"), current := 6{x~^'^'^), 
£ ■= [Hziij, and mc := 2. 

5 for 1 = 1,..., \fl do 
Update threshold value T := T(i). for 
t = {i- \)V1 + 1, . . . , (i - \)VT + V7 do 

Update i := • ^ + jj and mc := 2 + [^\ {d - 2). 
Sample y G C]^^{x'^) as described in Section 3.1. 
Round y up to the nearest grid point y+ G f (X) and compute 
the 5(X)-critical point y~^'^"- as well as 5{y~^'^"'). 
if > GLOBAL then update global := (5(y+'^"). 

if A6 := (5(y+'*") - current > T then update x'^ := j/+ and 
CURRENT := 6{y~^'^'^). 
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Algorithm 3: Subroutine Thresholds(n, d, X, \fl, (5) for computing 
the threshold sequence. 

1 Initialization: 

2 Initialize i := [^^y^J, and mc := 2. 

3 for t = 1, . . . , vT do 

Update i := • ^ + and mc := 2 + [t/Vl\ {d - 2). 
Sample a random point: pick x G [0, l)'^ with respect to vr"' and 
round x up to the nearest grid point x"*". Compute the 
(^(X)-critical point x"'"''*'^ and (5(x+'^"). 

6 Sample y G C'™'^(x"'") as described in Section 3.1. 

7 Round y up to the nearest grid point y~^ G ^{X) and compute the 
(^(X)-critical point as well as 6{y~^'^''^). 
f{i) := 

9 Sort thresholds in increasing order to obtain threshold sequence 
{T{i))f^^ with T{i) < T{i + 1) for aU i G [n - 1]. 



and (^(X)-critical 5*'*"-values. 
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5 Theoretical Analysis 



From our main innovations, namely the non-uniform sampling strategy and 
the rounding procedures "snapping up" and "snapping down", we already 
analyzed the snapping procedures and proved that they enlarge the quality 
of our estimates. The analysis of the non- uniform sampling strategy is much 
more complicated. One reason is that our sampling strategy strongly inter- 
acts with the search heuristic threshold accepting. That is why we confine 
ourselves to the analysis of the pure non-uniform sampling strategy without 
considering threshold accepting. 

In Section 5.1 we prove that sampling in the d-dimensional unit cube 
with respect to the probability measure tt'^ instead of A"^ leads to superior 
discrepancy estimates. (More precisely, we restrict our analysis for technical 
reasons to the objective function 5.) In Section 5.2 we verify that for d = 1 
sampling with respect to the probability distribution induced on T(X) by 
sampling with respect to ir'^ in [0, l]'^ and then rounding to the grid T{X) 
leads to better discrepancy estimates than the uniform distribution on T{X). 
We comment also on the case d > 2. In Section 5.3 we prove that for 
random point sets X the probability of x G r(X) beeing a critical point 
is essentially an increasing function of the position of its coordinates Xj in 
the ordered sets Tj{X), j = 1, . . . ,d. Recall that critical points yield higher 
values of the local discrepancy function 6* and include the point that leads 
to its maximum value. Thus the analysis in Section 5.3 serves as another 
justification of choosing a probability measure on the neighborhoods which 
weights points with larger coordinates stronger than points with smaller 
coordinates. 

5.1 Analysis of Random Sampling with Respect to A'' and tt'^ 

Here we want to show that sampling in the d-dimensional unit cube with 
respect to the non-uniform probability measure vr'^ leads to superior results 
than sampling with respect to the Lebesgue measure A"^. 

Before we start with the theoretical analysis, let us give a strong indi- 
cation that our non-uniform sampling strategy is much more appropriate 
in higher dimension than a uniform sampling strategy. In [WF97] Winker 
and Fang chose in each of the dimensions d = 4, 5, 6 in a random manner 10 
lattice points sets, cf. also our Section 6. They calculated the discrepancy of 
these sets exactly. If r]d denotes the average value of the coordinates of the 
points y with 5*{y) = sup^g[o,i]d 6*{z), we get r/4 = 0.799743, % = 0.840825, 
and rjQ = 0.873523. The expected coordinate value /i^ of a point x, randomly 
sampled from [0, 1)'' with respect to the measure ir'^, is /U^ = d/{d + 1). So 
we get Hi = 0.8, /is = 0.83, and = 0.857143. Note that for using A*^ 
instead of tt'^ the expected coordinate value is only 0.5 for all dimensions. 
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5.1.1 Random Sampling in the Unit Cube with Respect to A*^ 

We analyze the setting, where we sample in [0, l]'^ with respect to A"^ to 
maximize the objective function 6. A similar analysis for 5 is technically 
more involved than the proof of Proposition 5.1. Furthermore, it leads to 
a less clear and also worse result. We comment on this at the end of this 
subsection. 

Proposition 5.1. Let e G (0,1), let n, d G N, and let X = (x*)"^]^ he a 
sequence in [0, l)'^. Let x* = x*{X) G [0, 1]*^ satisfy 6{x*) = sup^gjQ^^jd 6{x). 
Let us assume that Vx* > £■ Consider a random point r G [0, 1]'^, sampled 
with respect to the probability measure A*^. // = P^{X) denotes the 
probability of the event {r G [0, 1]"^ | 6{x*) — 6{r) < e}, then 

P^> -. (4) 

This lower bound is sharp in the sense that there exist sequences of point 
configurations {X^^^} such that Yiva.^^^ d\ e~'^ P^ {X^^^) converges to 1 as e 
tends to zero. 

Let additionally e G (0, 1) and i? G N. Consider random points 
r%...,r 

6^ := maxfL^5(r'). // 



1 ^-R g [0,1]"', sampled independently with respect to A*^, and put 



i?>|ln(e)||ln(l-^)| \ (5) 

then 6{x*) — 6^ < e with probability at least 1 — e. 

Notice, that the case Vx* < £ left out in Proposition 5.1 is less im- 
portant for us, since our main goal is to find a good lower bound for the 
star-discrepancy Indeed, the approximation of d'^{X) up to an ad- 

missible error e is a trivial task if < e. If d'^{X) > e, then Vx* < e 

implies 6{x*) < d'^{X), and the function 6 plays the significant role. 

Proof. For x < x* we get 

1 " 

Six) =Vx--Y^ l[o,x)(^') > S{x*) - (Vx* - Vx) . 

Therefore the Lebesgue measure of the set 

Ae{x*) := {x G [0, 1]"' I X < X*, Vx* -Vx<e} (6) 
is a lower bound for P^. Due to Proposition A. 2, we have for d > 2 

aV,(.-)) = 5;^E«'')(t|: 



17 



with positive coefficients hk{d). In particular, we have bQ{d) = 1. Thus, 



1 f'^ f'^ 

X* 

and this estimate is obviously also true for d = 1. Let us now consider for 
sufficiently large /c G N point configurations X^^^ = {x'^^^''^)'^^^, where 

xf^'^ = ... = xP'" = k/{k + l)>e (8) 

and x^'^ < k/{k + I) - e for ah i G [n], j > 1. Then obviously x*{X'^''^) = 

{k/{k + l), 1, . . . , 1), and it is easy to see that P^{X^^'^) = X'^{Ae{x*)). From 
Proposition A. 2 we get 

This proves that estimate (4) is sharp. Notice that we assumed (8) only for 
simplicity. Since 5{x*{X)) is continuous in X, we can find for fixed k an 
open set of point configurations doing essentially the same job as X^'^^ 

Assume now 6{x*) — 6^ > e, i.e., 6{x*) — S{r^) > e for all i < R. The 
probability of this event is at maximum (1 — e'^/dl)^ . This probability is 
bounded from above by e if i? satisfies (5). □ 

For d > 1 and e < 1/2 we have | ln(l - e'^/d\)\'^ ~ d\e~'^. In this 
case we can only assure that 6^ is an e-approximation of sup^^^Q i^d 6{x) 
with a certain probability if the number R of randomly sampled points is 
super-exponential in d. 

Let us end this section with some comments on the setting where we 
are only interested in maximizing 6. If for given e > 0, X e [0, l)""' the 
maximum of 6 is achieved in x = x{X) G [0, 1]"^, and if we want to know the 
probability of the event {r G [0, l]'^ \6{x) — 6{r) < e}, there seems to be no 
alternative to estimating X'^{U{x)), where 

U{x) := {r £[0,lf\x<r, V,. - < e} . 

It is easy to see that X'^{U{x{X))) approaches zero if one of the coordinates 
of X tends to 1 - regardless of e and Vx ■ We omit a tedious error analysis to 
cover the (5-setting. 

5.1.2 Random Sampling in the Unit Cube with Respect to ir'^ 

Similarly as in the preceding section, we analyze here the setting where, in 
order to maximize S, we sample in [0, 1]'' with respect to vr'^. 
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Proposition 5.2. Let e,d,n,X = (x*)"^^ and x* as in Proposition 5.1. 
Again assume Vx* > e. Consider a random point r € [0, 1]°', sampled with 
respect to the probability measure tt'^. If = P^{X) denotes the probability 
of the event {r G [0, 1]"^ | S{x*) — 6{r) < e}, then P^ > e*^. This lower bound 
is sharp, since there exists a point configuration X such that P^{X) = e"^ . 

Let additionally e € (0, 1) and R G N. Consider random points 
r^,...,r^ G [0,1]"', sampled independently with respect to tt'^, and put 
6^ := max^i5(r*). // 

R>\He)\\Hl-e'')\-\ (9) 
then 6{x*) — 6^ < e with probability at least 1 — e. 

Proof. Clearly P^ > it'^{Ai;{x*)), where ^^(x*) is defined as in (6). Due 
to Proposition A. 5 we have ir'^{Ai;{x*)) > e'^. Let us now consider the 
point configuration X, where x\ = ... = x" = e and x*- < e for all i G 
[n], j > 1. Furthermore, at least an e~^-fraction of the points should be 
equal to (e, 0, . . . , 0). Then obviously x*{X) = (e, 1, . . . , 1) and P^{X) = 
7t'^{A,{x*)) = e'^. 

Let us now assume that 5{x*) — 6^ > e, i.e., 6{x*) — 6{ri) > e for all 
i < R. This happens with probability not larger than (1 — e*^)^. Therefore 
we have (1 - e'^)^ < e if i? satisfies (9). □ 

If d > 1 and e < 1/2, then | ln(l — e'^)!"^ ~ e~'^. Here the number of iter- 
ations R ensuring with a certain probability that 6^ is an e-approximation 
of sup{(5(x) I X G [0, 1]"^} is still exponential in d, but at least not super- 
exponential as in the previous section. 

Altogether we see that a simple sampling algorithm relying on the prob- 
abilistic measure tt'^ rather than on A'^ is more likely to find larger values of 
6. 

5.2 Analysis of Rounding to the Coordinate Grid 

As described in Sections 3.1 and 3.2, our non-uniform sampling strategy 
on the grids T{X) and T(X) for the objective functions 6 and 5 consists of 
sampling in [0, 1]"' with respect to vr'^ and then rounding the sampled point 
y up and down to grid points y^ and y~ , respectively. This induces dis- 
crete probability distributions Wu = {'Wu{z))^^y'{x) — i''^ii^))z&{x) 
on T{X) and T{X), respectively. If we use additionally the rounding pro- 
cedures "snapping up" and "snapping down", as described in Section 4.1, 
this will lead to modified probabilistic distributions wf[^ = {'w^{z))z^f'(^x) 
and wf^ = {wf'^{z))z^Y'(^x) and T{X), respectively. In dimension 
d = 1 the probability distributions Wu and as well as wi and iyj*" are 
equal, since every test box is a critical one. Essentially we prove in the 
next section that in the one-dimensional case sampling with respect to the 
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probability distributions Wu = w^" [wi = wf"] leads to larger values of 6 [5] 
than sampling with respect to the uniform distribution on T{X) \r{X)]. 

5.2.1 Analysis of the 1-Dimensional Situation 

Recall that in the 1-dimensional case vr = vr^ coincides with A = A^. 

To analyze the 1-dimensional situation, let X := (x*)"^^ be the given 
point configuration in [0,1). Without loss of generality we assume that 
< < • • • < x" < 1. Since 6*{1) = we do not need to consider the 
whole grid T{X) but can restrict ourselves to the set T{X) = {x^, . . . ,x^}. 
For the same reason, let us set := x^ if y > x^ (recall that, following the 
description given in Section 3.1, we set y~ := x" for y < x^ anyhow). 

As discussed above, we take points randomly from T{X), but instead of 
using equal probability weights on r(A), we use the probability distributions 
Wu = wfj^ and wi = w^"^ on T{X) to maximize our objective functions S 
and 6, respectively. If we put x^ := x" — 1 and x"+^ := x^ + 1, then the 
corresponding probability weights for 6 and 6 are given by wi{x'^) := x* — x*~^ 
and Wu{x^) '■= x*"*"^ — x*, respectively. 

In the next lemma we will prove the following statements rigorously: If 
one wants to sample a point r € r(A) with 6{t) as large as possible or if 
one wants to enlarge the chances to sample the point r where 5 takes its 
maximum, its preferable to use the weights wi instead of the equal weights 
1/n on r(A). Similarly, it is preferable to employ the weights Wu{x^), i = 
1, . . . , n, instead of equal weights if one wants to increase the expectation of 
6 or the chances of sampling the maximum of 6. 

Lemma 5.3. Let d = I and r, r G r(X) with 5{t) = sup^gjg i] ^i^) ^"^d 
5{f) = sup2g[o^i] 5{z). Then we have wi{t) > 1/n and Wu{f) > 1/n. 

Furthermore, let E, E;, and E„ denote the expectations with respect to 
the uniform weights {1/n}, the weights {wi{x^)}, and the weights {i«„(x*)} 
on the probability space T{X), respectively. Then E;((5) > E((5) and E^(J) > 
E{5). 

Proof. Let u € [n] with r = x'^. Assume first wi{x'^) < 1/n, i.e., x"^ — x"^""^ < 
1/n. 11 u > 1, then 

5(x^-i) = x^^-i - > - = Six"") . 

n n 

11 V = 1, then however 

5(x") = x" - =x^ + ->x^ = dix") . 

n n 

So both cases result in a contradiction. 

We prove now E;(5) > K{6) by induction over the cardinality n of X. 
For ra = 1 we trivially have E;(5) = X"*^ = K{S). 
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Therefore, let the statement be true for n and consider an ordered set 
T{X) := {x^ , . . . , x'^'^^} . Let 6 achieve its maximum in x"^ G We 
aheady proved that wi{x'^) = x'^ — x'^~^ > l/(n + 1) holds. With the 
notation 

X* := X* if 1 < i < u, X* := x*"*"^ — if i> u, 

n + 1 



and 



n 

we get 



X := X , z = l,...,n, and x := x — 1 



ri+l '^'i / 1 \ ""''"'^ 

= = + (wiixn - -—jSix'') + y^wi{x')6{x') 

n+1 V n+1/ ^-^ 

n + lV n + 1/ ^-^ 



1=1 



i=2 



+ x^ - x" + 5 + > ^ - 5'"' U 



n + 1 V n + 1/ V n + 1 



On the other hand we have 

E(5) = y -^K^') = ^ + V ^ f x^ - ^ 

^n + l ^ ' n + 1 ^n + lV n + 1 

i=l j=l 



(5(X^) , / ^J2^ 1 i - 1 

^ i=l 



n + 1 Vn + l/ n V n 



These calculations and our induction hypothesis, applied to {x^,...,x"}, 
lead to ¥,i{d) > K{6). For /x € [n] with f = x^ the inequalities Wu{x^) > 1/n 
and > E(^) can be proved in a similar manner. □ 



5.2.2 Analysis of Higher Dimensional Situations d>2 

If we turn to the situation in dimension d > 2, we first face a technical 
difference: For many sequences X the supremum sup2,g[o,i]<i ^{x) will be 
achieved by some x E f(X) with Xj = 1 ^ {xj,...,x"} for at least one 
j € [c?]. Therefore, we typically have to consider the whole grid T{X) if we 
want to maximize 5. 

Let us now have a look at the weight functions wi^ Wu induced by our 
algorithms in dimension d >2. Actually, here we only consider the simpler 



21 





„.2 
X 




x^ 


1 


1 1 


1 — a 


1 




i 

7 


) — a 


X 






— t 


x^ 



Figure 1: 6 takes its maximum in x^, but wi{x'^) < 1/36 

Lebesgue measure A"^, but it is obvious how to modify the definitions and 
arguments below to cover the vr'^-setting. 

For all j G [d] let Uj := \ fj{X)\. As in Section 2.2 let (/>j : [i^j] fj{X) be 
the ordering of the set Tj{X). Set (j)j{0) := 0. For y = {cj)i{ii), . . . ,(j)diid)) £ 
f (X), ii, . . . £ [uj], we define the weight 'Wi{(j)) by 

d 

My) ■= n ('^j(^j) ~ "^J^^J ~ • 

For the definition of the weights Wu{y) let = for i G [z^j — 1] and let 
hi^j) ■= + For y = (<^i(n), • • -Adiid)) G r(X), zi,. . . G [i^j-l], 
let 

d 

Wu{y) ■■= n + 1) - • 

i=i 

Let y G r(X) and y G r(X). Then the weights wi{y) and Wu{y) are ob- 
viously the probabilities that after sampling a point z in [0, l]'^ with respect 
to A*^, we end up with = y and = y, respectively. 

The simple examples in Figures 1 and 2 with d = 2 and n = 5 illustrate 
that we cannot prove the extension of the weight inequalities from Lemma 
5.3 in d > 2. More precisely: If r G r(X) and r G r(X) are points with 
6{t) = supyg[Q ]^]d (5(y) and 6{f) = sup^^gjo,!]'' then in general we do not 
have 

d ^ d ^ 

Mr) > n - Mr) > n ^^-TT ' (10) 
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Figure 2: 5 takes its maximum in x^, but < 1/36 

even wi{t) or Wu{f) > (n+l)"'^ need not to be satisfied. Notice that for fixed 
d and n the sets of counterexamples to each of the inequahties in (10) have 
strictly positive measure. Indeed, the suprema of 5 and 6 are continuous in 
X, and the weights wi, Wu are continuous in X as long as for each j G [d\ 
all coordinates xj, . . . ,x^ are distinct. But it is, e.g., easy to rearrange the 
counterexamples in Figure 1 and 2 slightly such that these constraints are 
satisfied. 

But notice also that in both figures the grid point 7 provides a good 
approximation of the actual maximum of 5 and 5, and 7 has a rather large 
weight wi and Wu, respectively. Furthermore, the point sets in Figure 1 and 2 
are quite artificial and also far away from being well-distributed. For random 
or low-discrepancy sets these probability weights can still be very useful - 
this is also confirmed by our numerical experiments, see Section 6. Moreover, 
the invalidity of (10) does not necessarily mean that the expectations 1E((5) 
and E((5) are larger than Ei{5) and E„((5), respectively. 

Actually, if we use additionally the procedures "snapping up" and "snap- 
ping down" and look at the induced probability distribution wf^ and w^, 
respectively, then has the maximum weights wf^ of all critical points in 
Figure 1, and has the maximum weight wfP' of all critical points in Figure 
2. Thus adding the snapping procedures can change the situation decisively. 

Additional theoretical results in this direction would be interesting. 
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5.3 The Chances of a Grid Point Beeing a Critical Point 

If X = (x^, . . . , x") is a sequence in [0, l)"^ which has been chosen randomly 
with respect to the Lebesgue measure and if x G r(X), then the larger the 
components of x, the higher is the probability of x being a (5(X)-critical 
point. The same holds for x S r(X) and S{X), respectively. 

Proposition 5.4. Consider [0, 1)"*^ as a probability space endowed with 
the probability measure A"*^. Let l := {ii, . . . ,id) G [n + l]'^. If k indices 
i^(^i), . . . , i,^(k) of the ij, j = 1, . . . ,d, are at most n and the remaining d — k 
of them are equal to n + 1, then for uniformly distributed random variable 
X in [0, 1)"'^ the multi-index l is 6{X)-critical with probability 

(^)"'n(n-»fvo)-Ao)). 

Proof. Let ^> = {(pi, . . . ^cpd) be as in Section 4.1. Since the event that for 
all coordinates j G [d] we have |rj(X)| = n + 1 holds with probability 1, we 
restrict ourselves to this situation. 

Without loss of generality, we may assume that ii, ■ ■ ■ ,ik ^ n and ik+i = 
. . . = irf = n + 1. Obviously, S'fc+i($(t)), . . . , 5rf(<l>(i)) are (5-critical surfaces, 
since 4>k+i{n + 1) = . . . = (pdi^ + 1) = 1. For i = 1, . . . , A: let cij = cri{X) : 
[n] — )• [n] be the permutation with 

X^^ '< X-'^ '<■■■< X^^ ' <l. 

Clearly, = (pj{ij) = x"/^''^ for aU j G [k]. Since Si{x) n Sj{x) = for 

all i / j and ah x G [0, l]'^, the surfaces 5'i(^>(i)), . . . , Sk{^{i)) can only be 
(5(X)-critical if \{ai{ii), . . . ,ak{ik)}\ = k. More precisely, i is a (5(X)-critical 
multi-index if and only if the condition 

Vi G [A:] V/ G [k] \ {j} : xp^'^'^ < $(0/ = 
holds. This is equivalent to the k conditions 

a];^{a2{i2)) ,o-i^{<y3{i3)) , o-f ^(crfc(ifc)) < ii , 

'7^^(cri(n)) ,cr^^(<T2(i2)) ,. . . ,0-^^(o-fc-l(«fc-l)) < ik ■ 

Since all the components x*-, i G [n], j G [d], of X are independent ran- 
dom variables, we have that for a fixed index u G [d] each permutation 
r : [n] — 7- [n] is equally likely to fulfill cJi/(X) = r. Thus, the probability of 
L being a (5(X)-critical index is just the number of /c-tuples (fii, . . . ,ak) of 
permutations fulfilling (11), divided by (n!)^. 
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For given pairwise distinct values ai{ii), . . . , (Tk{ik) the j-th. condition of 
(11) is satisfied by {{ij — 1) . . . {ij — [k — l)))(n — A;)! permutations <Tj. Since 
all k conditions in (11) can be solved independently of each other, it is now 
easy to deduce the statement of the Proposition. □ 

To state the corresponding proposition for 5, we have to introduce Stir- 
ling numbers of second kind S{d, k). For /c G N, /c < d let S{d, k) denote the 
number of partitions of [c?] into k non-empty subsets. A closed formula for 
S{d, k) is 

Sid k)-s-t}n!i^ 

This formula and other useful identities can, e.g., be found in [Rio58]. 

Proposition 5.5. Let X be a uniformly distributed random variable in 
[0,1)"'^. Let i = (ii,...,irf) G [nY. Then i is a 5{X)-critical multi-index 
with probability 

f;s(<i,.)((!i^)"'n(n(i,-.)). 

Proof. We just need to consider the case where the almost-sure event 
\rj{X)\ = n for all j G [d] holds. For j = 1, . . . , d let (Tj := aj{X) : [n] [ii] 
be the permutation with 

<Ji(l) cr,(2) o"i(ra) 

X-'^ ' < Xj'^ '<■■■< X-'^ ' <l. 

Then $(0j = (l)j{ij) = x"/^''^ for ah j G [d]. It is easy to see that the surface 
5j($(i)) is (5(X)-critical if and only if the condition 

Vj G [d] V/ G [d] \ {j} : xp^'^'^ < = xf'^^ 

is satisfied. This can be rewritten as 

<yi^{<y2{i2)) ,cri^{crs{is)) ,. . . , a^'^iadiid)) < n, 
a^^(cri(ii)) ,cr^^(cr3(i3)) ,. . . , ^ (<Trf(id)) < Z2 , 

0-rf^(o-l(il)) ,cr^^(cr2(i2)) V • • ,0"^ H-^d-l («d-l)) < id ■ 

If |{(Ti(ii), . . . ,o"rf(id)}| = k, then there exist 

d k-l 

S{d,k)n\{{n-kyY-'l[ll{ij-u) 



(12) 



j=l u=l 



permutations satisfying (12). With this observation and the fact that all 
components x*-, i G [n], j G [d], of X are stochastically independent, it is 
now easy to deduce the statement of Proposition 5.5. □ 
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6 Experimental Results 



We now present the experimental evaluations of the algorithms. We will 
compare our basic and improved algorithms, TA_basic and TA_improved, 
against the algorithm of Winker and Fang [WF97], and also give a brief 
comparison against the genetic algorithm of Shah [ShalO] and the integer 
programming-based algorithm of Thiemard [ThiOlb]. 

6.1 Experimental Setup 

We divide our experiments into a thorough comparison against the algorithm 
of Winker and Fang [WF97], given in Section 6.3, and more brief compar- 
isons against the algorithms of Shah [ShalO] and Thiemard [ThiOlb], in Sec- 
tions 6.4 and 6.5, respectively. The algorithms TA_basic and TA_improved, 
as well as the algorithm of Winker and Fang [WF97], were implemented 
by the authors in the C programming language, based on the code used 
in [Win07]. All implementations were done with equal care. In the case of 
Winker and Fang [WF97], while we did have access to the original Fortran 
source code (thanks to P. Winker), due to lack of compatible libraries we 
could not use it, and were forced to do a re-implementation. 

For the integer programming-based algorithm of Thiemard [ThiOlb], 
E. Thiemard has kindly provided us use of the source code. This source 
code was modified only as far as necessary for compatibility with newer 
software versions ~ specifically, we use version 11 of the CPLEX integer 
programming package, while the code of Thiemard was written for an older 
version. Finally, Shah has provided us with the application used in the ex- 
periments of [ShalO], but as this application is hard-coded to use certain 
types of point sets only, we restrict ourselves to comparing with the exper- 
imental data published in [ShalO]. Random numbers were generated using 
the Gnu C library pseudorandom number generator. 

The instances used in the experiments are described in Section 6.2. For 
some instances, we are able to compute the exact discrepancy values either 
using an implementation of the algorithm of Dobkin et al. [DEM96], avail- 
able from the third author's homepage^, or via the integer programming- 
based algorithm of Thiemard [ThiOlb]. These algorithms both have far 
better time dependency than that of Bundschuh and Zhu [BZ93], allowing 
us to report exact data for larger instances than previously done. For those 
instances where this is too costly, we report instead the largest discrep- 
ancy value found by any algorithm in any trial; these imprecise values are 
marked by a star. Note that this includes some trials with other (more time- 
consuming) parameter settings than those of our published experiments; 
thus sometimes, none of the reported algorithms are able to match the ap- 
proximate max value. 

^Found at http://www.mpi-inf.mpg.de/~wahl/. 
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As parameter settings for the neighbourhood for TA_basic, we use I = 
[^J if n > 100, and i = [^J otherwise, and mc = 2 throughout. These 
settings showed reasonable performance in our experiments and in [Win07]. 
For TA_improved, these parameters are handled by the scahng described in 
Section 4.2. 

Throughout, for our algorithms and for the Winker and Fang algo- 
rithm, we estimate the expected outcome of running 10 independent tri- 
als of 100,000 iterations each and returning the largest discrepancy value 
found, and call this the best- of- 10 value. The estimation is computed from 
a basis of 100 independent trials, as suggested by Johnson [Joh02], which 
strongly decreases irregularities due to randomness compared to the method 
of taking 10 independent best-of-10 values and averaging these. The com- 
parisons are based on a fix number of iterations, rather than equal running 
times, as the point of this paper is to compare the strengths of the involved 
concepts and ideas, rather than implementation tweaks. For this purpose, 
using a re-implementation rather than the original algorithm of Winker and 
Fang [WF97] has the advantage that all algorithms compared use the same 
code base, compiler, and libraries, including the choice of pseudo-random 
number generator. This further removes differences that are not interesting 
to us. 

6.2 Instances 

Our point sets are of four types: Halton sequences [Hal60], Faure se- 
quences [Fau82], Sobol' point sets [Sob67], and so-called Good Lattice Points 
(GLP), described below. The Halton sequences and GLPs were generated 
by programs written by the authors, the Faure sequences by a program of 
John Burkardt [Bur], and the Sobol' sequences using the data and code of 
Stephen Joe and Frances Kuo [JK08, KuolO]. 

Winker and Fang tested their algorithm for several point sets in dimen- 
sion d = 4, 5, . . . , 11, which were constructed in the following manner: Let 
(n, /ii, . . . , hd) G N'^+^ with < /ii < /12 < ■ ■ ■ < hd < n, where at least 
one hi is relatively prime with n, i.e., their greatest common divisor is one. 
Then the points x^, . . . , x" E [0, l)'^ are given by 

where {x} denotes the fractional part of x G M, i.e., {x} = x — [x\. Winker 
and Fang call {x^, . . . , x"} a good lattice point (GLP) sef^ of the generating 
vector {n,hi, . . . ,hfi)- It is known that for any d > 2 and n > 2 there 
exists a generating vector such that the corresponding GLP set exhibits 
asymptotically a discrepancy of 0(log(n)'^/n), where the implicit constant 
of the big-O-notation depends solely on d, see, e.g., [Nie92, Sect. 5.2]. 

^Other authors call it GLP set if it additionally exhibits a small discrepancy. 
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Algorithm 



Smaller instance Larger instance 
d = 10, n = 100 d = 20, n = 1000 



TA_basic 

TA_improved, S only 
TA_improved, 5 only 
TA_improved, mixed form 
Winker & Fang 



0.78s 9.34s 

1.22s 10.94s 

0.85s 9.11s 

1.87s 20.37s 

0.61s 7.2s 



Table 1: Running times for the considered algorithms. All algorithms executed one trial 
of 100,000 iterations. The inputs are two randomly generated point sets. 

Winker and Fang considered two series of examples to test their al- 
gorithm: First, they (randomly) generated in each dimension d = 4,5,6 
ten GLP n-point sets, where n S {50, 51, . . . , 500} for d = 4, n E 
{50, 51, ... , 250} for d = 5 and n G {25, 26, ... , 100} for d = 6. For each 
GLP set the exact discrepancy was calculated with an implementation of 
the algorithm of Bundschuh and Zhu [BZ93]. 

Secondly, they considered six GLP sets in dimension d = 6, 7, ...,11 
with cardinality between 2129 and 4661 points and performed 20 trials with 
200, 000 iterations for each of the six sets. Solving these instances exactly 
is mostly intractable, even with the algorithm of Dobkin et al. [DEM96]. 
Therefore, with the exception of the smallest instance, it cannot be said if 
the results of this second series of examples are good approximations of the 
real discrepancy of the GLP sets under consideration or not. 

6.3 Comparisons against the Algorithm of Winker and Fang 

To begin the comparisons, an indication of the running times of the al- 
gorithms is given in Table 1. As can be seen from the table, TA_basic 
takes slightly more time than our implementation of Winker and Fang, and 
TA_improved takes between two and three times as long, mainly due to the 
snapping procedures. For TA_improved, we report the separate times for S 
and S optimization, as well as the time required for a mixed optimization of 
both (as is done in TA_basic). As can be seen, the overhead due to splitting 
is negligible to non-existent. 

The parameter settings for our implementation of the algorithm of 
Winker and Fang are as follows. Since our experiments did not reveal a 
strong influence of the choice of a on the quality of the algorithm, we fix 
a := 0.995 for our experiments. Winker and Fang do not explicitly give 
a rule how one should choose k and mc. For the small-dimensional data 
(Table 2), we use the settings of [WF97]. For the other tests, we use mc = 3 
if d < 12 and mc = 4 otherwise, and A; = 41 if n < 500 and k = 301 other- 
wise. This seems to be in line with the choices of Winker and Fang for the 
sizes used. 

Table 2 shows the data for the GLP sets used by Winker and Fang 
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Table 2: Data for GLP sets used by Winker and Fang [WF97]. Discrepancy values 
marked with a star are lower bounds only (i.e., largest discrepancy found over all executions 
of algorithm variants). All data is computed using 100 trials of 100, 000 iterations; reported 
is the average value of best-of-10 calls, and number of times (out of 100) that the optimum 
(or a value matching the largest known value) was found. The data for Winker and Fang 
is for our re-implementation of the algorithm; the original results for the same instances 
can be found in [WF97]. 
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in [WF97]. Although the last group of point sets are quite large, note that 
this data is mostly of modest dimension. As can be seen, for these sizes, 
all algorithms behave reasonably, with both of our algorithms generally out- 
performing our implementation of Winker and Fang, and with TA_improved 
showing much higher precision than TA_basic. 

We note that our re-implementation of the Winker and Fang algorithm 
gives notably worse results than what was reported in [WF97] for the same 
instances. For the larger instances (i.e., with thousands of points), 200,000 
iterations are used in [WF97] while we use 100,000 iterations throughout, 
but there is also a clear difference for the smaller settings. Adjusting the pa- 
rameters of our implementation to match those used in [WF97] has not been 
found to compensate for this. After significant experimentation, the best hy- 
pothesis we can provide is that there might be a difference in the behavior 
of the pseudo-random number generators used (in particular, as [WF97] 
uses a random number library we do not have access to). Still, even com- 
pared to the results reported in [WF97], our algorithms, and TA.improved 
in particular, still fare well. 

Table 3 shows the new data, for larger-scale instances. A few new trends 
are noticeable, in particular for the higher-dimensional data. Here, the 
algorithm of Winker and Fang seems to deteriorate, and there is also a larger 
difference emerging between TA_basic and TA_improved, in particular for 
the Sobol' sets. However, as can be seen for the 2048-point, 20-dimensional 
Sobol' set, it does happen that the lower bound is quite imprecise. (The 
value of 0.0724 for this point set was discovered only a handful of times over 
nearly 5000 trials of algorithm variants and settings.) 

The highest-dimensional sets (d = 50) illustrate the deterioration of 
Winker and Fang with increasing dimension; for many of the settings, the 
largest error this algorithm finds is exactly 1 jn (due to the zero- volume box 
containing the origin with one point). 

6.4 Comparisons with the Algorithm by Shah 

Table 4 lists the point sets used by Shah [ShalO]. The Faure sets here are 
somewhat nonstandard in that they exclude the origin, i.e., they consist of 
points 2 through ??, + 1 of the Faure sequence, where the order of the points 
is as produced by the program of Burkhardt [Bur] . 

Some very small point sets were omitted, as every reported algorithm 
would find the optimum every time. For all but one of the point sets, the 
exact discrepancy could be computed; the remaining instance is the first 500 
points of the 10-dimensional Faure sequence. 

Most of the sets seem too easy to really test the algorithms, i.e., all 
variants frequently find essentially optimal points. The one exception is the 
last item, which shows a clear advantage for our algorithms. We also find 
(again) that TA_improved has a better precision than the other algorithms. 
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d5o(') TA_basic TA_improved Winker & Fang 



Name 


d 


n 


found 


Hits 


Best-of-10 


Hits 


Best-of-10 


Hits 


Best-of-10 


Sobol' 


7 


256 


0.0883 


1 


0.0804 


78 


0.0883 





0.0819 


Sobol' 


7 


512 


0.0452 


1 


0.0440 


17 


0.0451 





0.0395 


Sobol' 


8 


128 


0.1202 





0.1198 


98 


0.1202 





0.1102 


Sobol' 


9 


128 


0.1372 


8 


0.1367 


100 


0.1372 





0.1254 


Sobol' 


10 


128 


0.1787 


36 


0.1787 


100 


0.1787 





0.1606 


Sobol' 


11 


128 


0.1811 


14 


0.1811 


97 


0.1811 





0.1563 


Sobol' 


12 


128 


0.1885 


1 


0.1873 


82 


0.1885 





0.1689 


Sobol' 


12 


256 


0.1110* 


2 


0.1108 


41 


0.1110 





0.0908 


Faure 


7 


343 


0.1298 


21 


0.1297 


100 


0.1298 





0.1143 


Faure 


8 


121 


0.1702 


99 


0.1702 


100 


0.1702 





0.1573 


Faure 


9 


121 


0.2121 


98 


0.2121 


100 


0.2121 





0.1959 


Faure 


10 


121 


0.2574 


95 


0.2574 


100 


0.2574 





0.2356 


Faure 


11 


121 


0.3010 


100 


0.3010 


100 


0.3010 





0.2632 


Faure 


12 


169 


0.2718 


73 


0.2718 


100 


0.2718 





0.1708 


GLP 


6 


343 


0.0870 


1 


0.0869 


36 


0.0870 





0.0778 


GLP 


7 


343 


0.0888 


3 


0.0883 


28 


0.0888 





0.0791 


GLP 


8 


113 


0.1422 


6 


0.1399 


95 


0.1422 





0.1303 


GLP 


9 


113 


0.1641 


98 


0.1641 


100 


0.1641 





0.1490 


GLP 


10 


113 


0.1871 


1 


0.1862 


94 


0.1871 





0.1744 


Sobol' 


20 


128 


0.2616* 





0.2576 


51 


0.2616 





0.0497 


Sobol' 


20 


256 


0.1856* 


13 


0.1854 


49 


0.1856 





0.0980 


Sobol' 


20 


512 


0.1336* 





0.1080 


86 


0.1336 





0.0635 


Sobol' 


20 


1024 


0.1349* 





0.0951 





0.1330 





0.0560 


Sobol' 


20 


2048 


0.0724* 





0.0465 





0.0505 





0.0370 


Faure 


20 


529 


0.2615* 





0.2587 


98 


0.2615 





0.0275 


Faure 


20 


1500 


0.0740* 





0.0733 


14 


0.0740 





0.0347 


GLP 


20 


149 


0.2581* 


1 


0.2548 


65 


0.2581 





0.0837 


GLP 


20 


227 


0.1902* 





0.1897 


1 


0.1899 





0.0601 


GLP 


20 


457 


0.1298* 





0.1220 


3 


0.1272 





0.0519 


GLP 


20 


911 


0.1013* 





0.0975 


8 


0.1013 





0.0315 


GLP 


20 


1619 


0.0844* 





0.0809 


2 


0.0844 





0.0299 


Sobol' 


50 


2000 


0.1030* 





0.0952 





0.1024 





0.0005 


Sobol' 


50 


4000 


0.0677* 





0.0597 





0.0665 





0.00025 


Faure 


50 


2000 


0.3112* 





0.2868 


100 


0.3112 





0.0123 


Faure 


50 


4000 


0.1979* 





0.1912 





0.1978 





0.0059 


GLP 


50 


2000 


0.1465* 





0.1317 





0.1450 





0.0005 


GLP 


50 


4000 


0.1205* 





0.1053 





0.1201 





0.0003 



Table 3: New instance comparisons. Discrepancy values marked with a star are lower 
bounds only (i.e., largest discrepancy found over all executions of algorithm variants). All 
data is computed using 100 trials of 100, 000 iterations; reported is the average value of 
best-of-10 calls, and number of times (out of 100) that the optimum (or a value matching 
the largest known value) was found. 
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TA_basic TA_improved Shah 



Class 


d 


n 


di>(-) 


Hits 


Bcst-of-10 


Hits 


Best-of-10 


Hits 


Best Found 


Halton 


5 


50 


0.1886 


100 


0.1886 


100 


0.1886 


81 


0.1886 


Halton 


7 


50 


0.2678 


100 


0.2678 


100 


0.2678 


22 


0.2678 


Halton 


7 


100 


0.1714 


9 


0.1710 


100 


0.1714 


13 


0.1714 


Halton 


7 


1000 


0.0430 





0.0424 


81 


0.0430 


gd) 


0.0430'^^ 


Faure 


10 


50 


0.4680 


100 


0.4680 


100 


0.4680 


97 


0.4680 


Faure 


10 


100 


0.2483 


52 


0.2483 


100 


0.2483 


28 


0.2483 


Faure 


10 


500 


0.0717* 


2 


0.0701 


100 


0.0717 


0(1) 


0.0689'^' 



Table 4: Comparison against point sets used by Shah. Reporting average value of best- 
of-10 calls, and number of times (out of 100) that the optimum was found; for Shah, 
reporting highest value found, and number of times (out of 100) this value was produced. 
The discrepancy value marked with a star is lower bound only (i.e., largest value found by 
any algorithm). Values marked (1) are recomputed using the same settings as in [ShalO]. 





TA_improved 


Thiemard: 


Initial 


Same time. 


Same result. 


Instance 


Time 


Result 


Time 


Result 


result 


time 


Faure- 12- 169 


25s 


0.2718 


Is 


0.2718 


0.2718 


Is 


Sobor-12-128 


20s 


0.1885 


Is 


0.1463 


0.1463 


453s (7.6m) 


Sobol'-12-256 


35s 


0.1110 


3s 


0.0872 


0.0873 


1.6 days 


Faure-20-1500 


280s (4.7m) 


0.0740 


422s (7m) 


0.0732 


None 


> 4 days 


GLP-20-1619 


310s (5.2m) 


0.0844 


564s (9.4m) 


0.0572 


None 


> 5 days 


Sobor-50-4000 


2600s (42m) 


0.0665 


32751s (9h) 


0.0743 


None 


32751s (9h) 


GLP-50-4000 


2500s (42m) 


0.1201 


31046s (8.6h) 


0.0301 


None 


> 5 days 



Table 5: Comparison against the integer programming-based algorithm of 
Thiemard [ThiOlb]. The values for TA_improved represent the time and av- 
erage result of a best-of-10 computation with 100, 000 iterations per trial. 
The middle pair of columns give the time required for [ThiOlb] to re- 
turn a first output, and the value of this output; the last two columns 
report the lower bound reached by [ThiOlb] if allocated the same time that 
TA_improved needs for completion, and the time required by [ThiOlb] to 
match the result of TA_iniproved. 

6.5 Comparisons with the Algorithms by Thiemard 

Finally, we give a quick comparison against the integer programming-based 
algorithm of Thiemard [ThiOlb]. Since [ThiOlb] has the feature that running 
it for a longer time produces gradually stronger bounds, we report three 
different checkpoint values; see Table 5 for details. The results are somewhat 
irregular; however, [ThiOlb] may require a lot of time to report a first value, 
and frequently will not improve significantly on this initial lower bound 
except after very large amounts of computation time (for example, for the 12- 
dimensional, 256-point Sobol' set, the value 0.0872 is discovered in seconds, 
while the first real improvement takes over an hour to produce). 

Thiemard also constructed a second algorithm for discrepancy esti- 
mation, based on delta-covers [ThiOla]; this is freely downloadable from 
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Thiemard's homepage. Its prime feature is that it provides upper bounds 
with a non-trivial running time guarantee. The lower bounds that it 
produces are not as helpful as the upper bounds, e.g., it was reported 
in [DGWIO] and [ShalO] that the lower bounds from the preliminary version 
of TA_basic [Win07] and the genetic algorithm of Shah [ShalO] are better. 
Thus we omit this kind of comparison here. 

7 Conclusion 

Our numerical experiments clearly indicate that the improvements made 
from the algorithm of Winker and Fang in TA_basic and TA_improved 
greatly increases the quality of the lower bounds, in particular for the dif- 
ficult higher-dimensional problem instances. Nevertheless, we do not fully 
understand the behavior of different algorithm variants with regards to snap- 
ping. In particular, one might well have expected the variant described in 
Section 4.4 to do better than the one of Section 4.1 that we currently use. 
It is still possible that a judicious application of the "snap-move" variant 
of Section 4.4, perhaps only in certain situations, can improve the behavior 
further. 

Still, all in all, we conclude that the algorithms TA_basic and 
TA_improved presented in the current work represent significant improve- 
ments over previous lower-bound heuristics for computing the star discrep- 
ancy, and to the best of our knowledge, make up the best performing star 
discrepancy estimation algorithms available. 
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A Calculation of A'^(Ae(^)) and 7r'^{Ae{z)) 

Lemma A.l. Let e G (0, 1], and let z G [0, 1]'^ with Vz > e. Then 



fc=0 



(13) 



Proof. Let Vz> £■ Then we have 



«1 J Old 



dCd-dCi 



where 



Q2 



ad 



Z2Z-i.:Zd ClZ?,-Zd ClC2"-Cd-l 

We prove formula (13) by induction over the dimension d. If d = 1, then 
clearly \{A^[z)) = e. Let now d >2. We denote by z the (d— l)-dimensional 
vector (^2, Zd) and by e the term (e + (Ci — zi)Vz)/Ci- Furthermore we de- 
fine for i G [d—l] the lower integration limit cij = (Vj— e)/(C2---C« Zi+i...Zd-i)- 
Note that Qj = Oj+i. Then, by our induction hypothesis, 



AMz)) 



Zd~l 



dCd-dC2dCi 



d-2 



"1 



Vz-{Vz-e)Y^ 



{-ln{l-e/Vz)f 



k=0 



kl 



dCi 



Vz - {Vz -e)- {Vz - e) 



d-l 



k=l 



kl VK-e 



V- \k 

' Ci 



Ci="i 



d-i 



Vz-{Vz-e)Y^ 

k=0 



{-ln{l-£/Vz))' 
k\ 



□ 



Proposition A. 2. Let d>2. For z G [0, 1]'' with Vz > e, we obtain 



d\ 

^ fc=0 



with positive coefficients 



bk{2) 



(fc + l)(fe + 2) 



bk{3) 



{k + 2){k + 3)^^u + 



k 

^ u+l 
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and 

k d-2 ^ 

r/ie power series converges for each e > uniformly and absolutely for all 
Vz ^ [e + e, 1]. Furthermore, we have bo{d) = 1, bi{d) = d{d — l)/2{d + 1), 
and for all k the inequality bj^{d) < d^ /2^~^ is satisfied. 

Proof. To prove the power series expansion, we consider the function 
R{x, d) = l-{l-x)Y, ^~ ^^^^ for [0,1). 

k=0 

Due to Lemma A.l we have X'^^A^^z)) = VzR{e/Vz,d). Since dxR{x,d) = 
(-ln(l - x)Y-^/{d - 1)!, it suffices to prove the following statement by 
induction over d: 



Y,{k + d)bk{d)x^+''-\ (14) 



(d-1)! d\ 

where the power series converges for each e > uniformly and absolutely on 
[0, 1 - e]. Let first d = 2. Then 

- ln(l -x) = ^^ = -J2{k + 2)6fc(2)x^+i , 

k=l ' k=0 

and the required convergence of the power series is obviously given. Now let 
d > 3. Our induction hypothesis yields 

(-ln(l - x))'^-^ 1 (-ln(l-2;))'^-2 



dx- 



{d-iy. 1-x (d-2)! 



^ cx] / k \ 



k=0 ^ /i=0 

where the last power series converges as claimed above. Now 

k k /J. Md-4 d—3 ^ 

d5:(, + d-i)M«-i) = E7rT^E- E U-T^zj^ 

k ui d—2 ^ 

= ^'EE - E n 

1^1+ d — 7 — 1 
= [k + d){k + d - l)bk{d) . 
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After integration we get (14). 

Furthermore, it is easily seen that bo{d) = 1 and bi{d) = d{d—l)/2{d+l). 
To complete the proof, we verify bk{d) < d^ /2^~^ for k > 2. The inequahty 
is obviously true in dimension d = 2. Hence let d > 3. From the identity 

' lk + d-2 



we obtain 

k ui Vd-z d-2 

k 



■I - ^ - V - Yj 1 1 (k + d—2 

^ ^ ■■■ ^ Vi + d-l- j - (d-2)\ 



which leads to 

d{d-l) {k + d-2)...{l + d-2) 



hk{d) < 



{k + d){k + d-l) k...l 



{k + d){k + d-l)\2J ' ^-2^-1 
Corollary A. 3. Let z G [0, l]'^. IjV^ > de, then 



□ 



We now consider the polynomial product measure vr'^. 
Lemma A. 4. Let e G (0, 1], and let z G [0, 1]'^ with Vz > e. Then 

AA{z)) = Vt - {Vz - ef ^(- ln(l - e/Vz)f , (16) 

fc=0 

and, as a function ofVz, 7r^(j4e(z)) is strictly increasing. 
Proof. Let Vz > e. We have 

7r^{A,{z))= [ d''vt^\\dx) = rG{Vz,r)dr, (17) 
JAs{z) Jo 

where 

G{Vz,r) ■.= d''{Vz-rf-'drXHAr{z)). 
From (13) we get for all < r < e 

drX'^iAriz)) = ^ 



{d-iy. 
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If we define 

F{r) := -{V, - vY Y.^^' ^^(l " '^Z^-))' 



k=0 



then we observe that F'{r) = G{Vz,r) holds. Thus we have 'rT'^{Ai;{z)) = 
F{e) — F{0), which proves (16). Furthermore, according to (17), we get 

dvyiAiz)) = r dvM{y^.r)dr. 
Jo 

The integrand of the integral is positive, as the next calculation reveals: 
dvMV.,r) = d\V, - ^)rf^2 (-ln(l-^r/V;))"-^ ^ _ ^^^^ _ ^ ^y^^ _ ^ ^y^^ 

for all < r < V^. Thus dy^ir'^ {Ai.{z)) > 0, and, considered as a function of 
Vz, iT'^{Ag{z)) is strictly increasing. □ 

Proposition A. 5. Let e G (0, 1], and let z G [0, 1]*^ with Vz > e. Then we 
have the lower bound 7r'^{Ai;{z)) > e'^. If furthermore Vz > de, then we have 
the estimate 

Proof. Let Vz=e. Then 



AMz))= I d''vt'\\dx) = \{ 



'[O'-i r=i 

Since t:'^{A^{z)) is an increasing function of V^, the first lower bound holds. 
Let now Vz > de. Here we use the simple estimate 

d'^{Vz - eY-^X^{Ae{z)) < ■K'^{Ae{z)) < d'^Vt^X^{Ae{z)) . 

Together with Proposition A. 2 and Corollary A. 3 this leads to 



e-^^e^ < (1 - l/df-'^^e' < AAiz)) < l^e' . 



□ 



Remark A. 6. Like X'^[Ai;[z)) in Proposition A. 2, one can also expand 
'rT'^{As{z)) into a power series. This leads to 

^^iA,iz)) = -e^Y.'"^{d)' 



dl ^ ' 'Wz 

k=0 ^ 



but here the coefficients ak{d) are not all positive. So we have, e.g., aQ{d) = 
1, but ai{d) = —d{d — l)/2{d + 1). Therefore the power series expansion is 
here less useful than in the situation of Proposition A. 2. 
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